Team ID: Team 6

NAME: Connor Rosenberg

NAME: Nassim Ali-Chaouche

NAME: Rongkui Han

NAME: Yuqing Yang


Introduction

Background

The Student/Teacher Achievement Ratio (STAR) was a four-year longitudinal class-size study funded by the Tennessee General Assembly and conducted by the State Department of Education. Over 7,000 students from kindergarten to 3rd grade in 79 schools were randomly assigned into one of three interventions: small class (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regular-with-aide class (22 to 25 students with a full-time teacher’s aide). Classroom teachers were also randomly assigned to the classes they would teach. The interventions were initiated as the students entered school in kindergarten and continued through third grade.

The STAR experiment was designed by a group of researchers including Helen Pate-Bain, the driving force behind Project STAR, other academics, and members of the Tennessee Department of Education. Some of its key features are:

  1. All Tennessee schools with K-3 classes were invited to participate. Giving every school a chance to join the study helped ensure a diverse sample as well as rule out the possibility that class-size effects could be attributed to selection bias.

  2. Each school included in the study had to have a large enough student body to form at least one of each of the three class types. The within-school design provided built-in control for differences among schools in terms of resources, leadership, and facilities.

  3. Schools from inner-city, urban, suburban and rural locations were included in the experiment. This feature guaranteed that samples would include children from various ethnic backgrounds and income levels.

  4. Students and teachers were randomly assigned to their class type.

  5. Investigators followed the standard procedures for confidentiality in human subjects’ research.

  6. No children were to receive fewer services than normal because of the experiment.

  7. Student achievement was to be tracked by standardized tests, which were carefully monitored.


Questions of Interest

For our analysis, we will consider the following questions:

  1. Is there a significant difference in the mean math scores in the 1st grade across different class types?
  2. Are the assumptions of the ANOVA model satisfied? To explore this question, we will use model diagnostics such as residual plots and a Levene test for equality of variances.
  3. Is there a significant difference in the mean math scores in kindergarten across different class types?

Methods & Analysis

Descriptive Analysis

There are 11,598 rows and 47 columns in the STAR dataset. Each row represents a student who participated in the experimental phase for at least one year. Each column records one variable; these include demographic information, school type, class type, and scores on different subject tests. For our analysis, we will primarily focus on the type of class and the scaled math score of first graders.

School Type

The project included four types of schools to assess the effects of class size in different school locations: inner-city, suburban, urban, and rural. Inner-city and suburban schools were located in metropolitan areas, and urban and rural schools were located in non-metropolitan areas.

According to the box plot, students in suburban and rural schools performed better on the total math scaled score in 1st grade. In addition, the bar plot shows that in all types of schools, students in small classes performed slightly better than students in regular classes or regular classes with a full-time teacher's aide. However, the effect of school type is not pronounced in this plot, and further analysis might be needed.

Class Type

Because some schools might have dropped out of the project, and a student's class type might have changed as schools gained or lost students, we select only students who had full records of math scaled scores and remained in the same type of class from kindergarten to the 3rd grade.

According to the line plot, the performance of students in regular classes and the performance of students in regular classes with a full-time teacher's aide are not very different. However, the students in small classes performed better on the total math scaled score from kindergarten to the 3rd grade.

One-Way ANOVA Model

For our analysis, we will use a variation of the factor effects model to capture the effect of the different class treatments. This model is the most useful for our analysis because it represents the expected change in math score relative to a student enrolled in the standard regular class treatment. As shown above, the study effectively randomized many of the demographic features of participants. This randomization allows us to use this simplified model to generalize the impact of class size on test scores across all demographics. Moreover, since our primary question examines the effect of class size on math scores, defining the model in this way allows us to compare each treatment directly with the regular class treatment.

The final model selected in this analysis is described below.

\[MathScore = \mu + T_2X_2 + T_3X_3 + \varepsilon \] where:

\(X_2 = 1\) if the student is assigned to a small class and \(0\) otherwise, \(X_3 = 1\) if the student is assigned to a regular + aide class and \(0\) otherwise, and \(\varepsilon\) is a random error term.

In this model, \(\mu\) represents the average first-grade math test score for students enrolled in regular classes. Furthermore, \(T_2\) represents the average change in score for students enrolled in small classes and \(T_3\) represents the average change in score for students enrolled in regular + aide classes, both compared to the regular class treatment.

Before fitting the model described above, it is important to examine the model assumptions.
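To make the indicator coding concrete, the expected score for each class type follows directly from setting \(X_2\) and \(X_3\) to the values defined above. This is only a restatement of the model under those definitions:

\[E[MathScore \mid \text{regular}] = \mu, \qquad E[MathScore \mid \text{small}] = \mu + T_2, \qquad E[MathScore \mid \text{regular + aide}] = \mu + T_3\]

So a test of \(T_2 = 0\) or \(T_3 = 0\) is a test of whether the corresponding class type has the same mean score as the regular class.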

  1. Normality

Examining the histograms of math scores for each of the three classroom treatments, we can see that the scores in each treatment follow an approximately normal distribution. Each treatment distribution has a slightly heavy left tail; however, the deviation from our model assumptions is small, and therefore we may continue without transforming our data.

  2. Independence

It is not reasonable to assume strict independence of our samples in this study. Students may study together, teachers may share materials, or parents may seek out tutoring for their child. However, this study represents the most controlled and robust setting we could possibly obtain to examine the effect of class size on the math test scores of first graders. Given this design and the large sample size, we consider the samples practically independent even though they are not strictly independent.

  3. Equality of Variance

From the histograms pictured above, the distributions of first-grade test scores across the three treatments appear to have similar variances. In addition, at a significance level of \(\alpha = 0.05\), the Levene test for equality of variances did not return significant evidence that the variances differ between groups (p = 0.059). Therefore, we may assume equality of variances between treatment groups.

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    2   2.826 0.05932 .
##       6595                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
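A minimal sketch of how a Levene test of this form could be run in R. It assumes the data are the `STAR` data frame shipped with the AER package (an assumption: the report does not name its data source, although the column names `math1` and `star1` in the output match that package) and uses `leveneTest()` from the car package, whose default center is the median.

```r
# Assumption: the STAR data come from the AER package; the report does not say so.
library(AER)  # provides the STAR data frame
library(car)  # provides leveneTest()

data("STAR", package = "AER")

# Keep first-grade students with a recorded class type and math score
grade1 <- subset(STAR, !is.na(math1) & !is.na(star1))

# Levene's test for equal variances across class types (center = median)
lev <- leveneTest(math1 ~ star1, data = grade1, center = median)
print(lev)
```

A p-value above the chosen significance level means the test gives no significant evidence against equal variances.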

With all three of our model assumptions checked, we can proceed to fit our ANOVA model.

Model Results

After fitting the model described above, we found significant evidence to suggest that at least one of the classroom treatments has a different mean score from that of the regular class treatment.

## Analysis of Variance Table
## 
## Response: math1
##             Df   Sum Sq Mean Sq F value    Pr(>F)    
## star1        2   194888   97444  53.263 < 2.2e-16 ***
## Residuals 6595 12065412    1829                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
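A table of this form can be produced from a one-way linear model. The sketch below again assumes the AER package's `STAR` data frame (an assumption, not stated in the report), in which `regular` is the first level of `star1` and therefore serves as the baseline.

```r
# Assumption: the STAR data come from the AER package; the report does not say so.
library(AER)

data("STAR", package = "AER")
grade1 <- subset(STAR, !is.na(math1) & !is.na(star1))

# One-way model with the regular class as the reference level
fit <- lm(math1 ~ star1, data = grade1)

anova(fit)  # F-test for any difference among class types
coef(fit)   # intercept = regular-class mean; other terms = change from regular
```

The intercept corresponds to \(\mu\) in the model above, and the two remaining coefficients correspond to \(T_2\) and \(T_3\).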

Both the Regular + Aide and Small class types showed an increase in average math scores compared to the regular class treatment group. The average score for students in regular classes was 525.28. Students enrolled in Regular + Aide classes performed on average 4.35 scaled points higher than regular class students. Similarly, students enrolled in Small classes performed on average 13.40 scaled points higher than regular class students.

Class.Size       Average.Math.Score   Change.from.Regular   Significant
Regular          525.28
Regular + Aide   529.63               4.35                  YES
Small            538.68               13.4                  YES

Model Diagnostics

Before validating our model, we must return to our assumptions and check that they are not violated.

  1. Normality

From the normal Q-Q plot below, we can observe that the residuals of our model follow the expected Q-Q line quite well. While there are some minor deviations at the tails, which we observed earlier in the histograms, the overall trend of the data indicates a largely normal distribution. Therefore, the normality assumption is satisfied.

  2. Independence

From the Residual vs Fitted plot below, the residuals are scattered randomly about zero with no visible pattern, which is consistent with our assumption of independence.

  3. Equality of Variance

Finally, the same Residual vs Fitted plot shows that the spread of the residuals is nearly indistinguishable across the three treatments. Paired with our earlier test for the equality of variances, this satisfies the assumption.

With all three model assumptions supported by the residual diagnostics, we move forward and validate this model. That is to say, this model is behaving the way we expect it to and is therefore a useful and informative model.


Further Tests

Since there is no violation of the ANOVA model assumptions, an F-test was used to test whether the mean math scaled score in 1st grade differs across class types. The hypotheses are shown below.

\(H_0: \mu_1 = \mu_2 = \mu_3\) vs. \(H_a: \text{not all } \mu_i \text{ are equal}\)

Since \(F^* = \frac{MSTR}{MSE} = 53.26 > F(1-\alpha; r-1, n_T-r) = 2.997093\), we reject the null hypothesis at the 0.05 level.
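The decision rule can be checked from the ANOVA table alone. The sketch below uses only base R and the sums of squares and degrees of freedom reported in that table; the variable names are illustrative.

```r
# Values copied from the ANOVA table reported above
sstr  <- 194888     # treatment (star1) sum of squares
sse   <- 12065412   # residual sum of squares
r     <- 3          # number of class types
n_t   <- 6598       # total observations: 2 + 6595 + 1
alpha <- 0.05

mstr   <- sstr / (r - 1)   # mean square for treatments
mse    <- sse / (n_t - r)  # mean square error
f_star <- mstr / mse       # observed F statistic

# Critical value and p-value with (r - 1, n_T - r) degrees of freedom
f_crit  <- qf(1 - alpha, df1 = r - 1, df2 = n_t - r)
p_value <- pf(f_star, df1 = r - 1, df2 = n_t - r, lower.tail = FALSE)

reject <- f_star > f_crit
round(c(F_star = f_star, F_crit = f_crit), 3)
```

The critical value uses \(r - 1 = 2\) numerator and \(n_T - r = 6595\) denominator degrees of freedom, matching the `Df` column of the table.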

To further investigate the differences among the class types, multiple comparisons were used. For each pair of class types, the hypotheses are shown below.

\(H_0: D_{ij} = \mu_i-\mu_j = 0\) vs. \(H_a: D_{ij} \neq 0\), for \(1\leq i < j \leq 3\)

The multipliers for the Bonferroni, Tukey, and Scheffé procedures are shown in the table below. The Scheffé multiplier is the smallest, so Scheffé's procedure was used to construct the confidence intervals.

Procedure    Multiplier
Bonferroni   2.39459079786542
Tukey        2.34423094428225
Scheffé      2.2830924522205
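For reference, a sketch of how the three multipliers are commonly computed for all pairwise comparisons, using base R quantile functions and the textbook formulas \(B = t(1-\alpha/(2g); n_T-r)\), \(T = q(1-\alpha; r, n_T-r)/\sqrt{2}\), and \(S = \sqrt{(r-1)F(1-\alpha; r-1, n_T-r)}\). The formulas are an assumption about the intended method; the report does not show how its values were obtained.

```r
r      <- 3             # number of class types
n_t    <- 6598          # total observations
alpha  <- 0.05          # family-wise significance level
g      <- choose(r, 2)  # number of pairwise comparisons
df_err <- n_t - r       # error degrees of freedom

bonferroni <- qt(1 - alpha / (2 * g), df = df_err)
tukey      <- qtukey(1 - alpha, nmeans = r, df = df_err) / sqrt(2)
scheffe    <- sqrt((r - 1) * qf(1 - alpha, df1 = r - 1, df2 = df_err))

round(c(Bonferroni = bonferroni, Tukey = tukey, Scheffe = scheffe), 4)
```

Under these formulas the Bonferroni and Tukey values agree with the table, while the Scheffé value comes out near 2.448 rather than 2.283. The tabulated figure is what results from using 3 rather than \(r - 1 = 2\) numerator degrees of freedom, so that entry may be worth rechecking.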

According to the confidence intervals, none of the intervals contains zero. Therefore, at the family-wise significance level of 0.05, we reject all of the null hypotheses.

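\(D_{ij}\)          \(\hat D_{ij}\)       \(CI\)
\(\mu_1 - \mu_2\)   -13.3982890749551    [ -16.38357 , -10.41301 ]
\(\mu_2 - \mu_3\)   9.0525616533937      [ 5.988097 , 12.11703 ]
\(\mu_1 - \mu_3\)   -4.34572742156138    [ -7.190509 , -1.500946 ]

Here \(\mu_1\), \(\mu_2\), and \(\mu_3\) denote the mean scores for the regular, small, and regular + aide classes, respectively.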
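Each interval above has the usual form for a pairwise comparison in a one-way ANOVA. As a sketch, writing \(M\) for the chosen multiplier and \(n_i\) for the number of students in class type \(i\) (the group sizes are not reported here, so they are left symbolic):

\[\hat D_{ij} \pm M \cdot s(\hat D_{ij}), \qquad \hat D_{ij} = \bar Y_{i} - \bar Y_{j}, \qquad s(\hat D_{ij}) = \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\]

As a consistency check on the first row, the midpoint of the reported interval recovers the point estimate, and half its width is the margin \(M \cdot s(\hat D_{12})\):

\[\frac{-16.38357 + (-10.41301)}{2} = -13.39829, \qquad \frac{-10.41301 - (-16.38357)}{2} = 2.98528\]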

Our ANOVA model and confidence intervals both provide significant evidence that the scaled math scores for first-grade students in small and regular + aide classes were significantly higher than the scaled math scores for students enrolled in regular classes.


Conclusions